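Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on equal footing using leaderboards, but evaluation choices become suboptimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics (GEM) Benchmark provides a modular infrastructure that lets dataset, model, and metric developers benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.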
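With recent improvements in natural language generation (NLG) models across a variety of applications, it has become imperative to have the means to identify and evaluate whether NLG output only shares verifiable information about the external world. In this work, we present a new evaluation framework, Attributable to Identified Sources (AIS), for assessing the output of natural language generation models when that output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline that allows annotators to evaluate model output appropriately according to AIS guidelines. We empirically validate this approach on three generation datasets (two in the conversational QA domain and one in summarization) through human evaluation studies, which suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release the guidelines for the human evaluation studies.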
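As a rough illustration of how two-stage judgements might be aggregated into a single number, the sketch below assumes that stage 1 asks whether an output is interpretable at all and that stage 2, asked only for interpretable outputs, asks whether all of its information is attributable to the source. The `Judgement` type and the `ais_score` function are hypothetical names; an output counts toward the score only if it passes both stages.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Judgement:
    """One annotator's verdict on one model output (hypothetical schema)."""
    interpretable: bool           # stage 1: can the output be understood at all?
    attributable: Optional[bool]  # stage 2: fully supported by the source? None if stage 1 failed


def ais_score(judgements: Sequence[Judgement]) -> float:
    """Fraction of outputs judged both interpretable and attributable."""
    if not judgements:
        raise ValueError("need at least one judgement")
    passed = sum(1 for j in judgements if j.interpretable and j.attributable is True)
    return passed / len(judgements)


judgements = [
    Judgement(True, True),
    Judgement(True, False),
    Judgement(False, None),  # stage 2 skipped for an uninterpretable output
    Judgement(True, True),
]
print(ais_score(judgements))  # 0.5
```

Treating uninterpretable outputs as failures (rather than dropping them) keeps a model from raising its score by producing unreadable text; that choice is also an assumption of this sketch.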
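NLP researchers need more, and higher-quality, text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web, such as WikiBio, are noisy and can include undesired biases. Moreover, data from the web is often included in the datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work, we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby turning dataset authoring from a writing task into an editing task. We use our method to curate SynthBio, a new evaluation set for WikiBio composed of structured attribute lists describing fictional individuals, mapped to natural language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio and also more balanced with respect to gender and nationality.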
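The seed-generation step can be pictured as building a few-shot prompt from attribute lists and letting a language model draft the biography that a rater then edits. The sketch below only builds the prompt; the `Attributes:` / `Biography:` layout and the function names are illustrative assumptions, not the format actually used for SynthBio, and the model call itself is left out.

```python
from typing import Dict, List, Tuple

Attributes = Dict[str, str]


def format_attributes(attrs: Attributes) -> str:
    """Flatten a structured attribute list into one line of text."""
    return " | ".join(f"{key}: {value}" for key, value in attrs.items())


def build_seed_prompt(examples: List[Tuple[Attributes, str]], target: Attributes) -> str:
    """Few-shot prompt: solved (attributes, biography) pairs, then the unsolved target."""
    blocks = [
        f"Attributes: {format_attributes(attrs)}\nBiography: {bio}"
        for attrs, bio in examples
    ]
    blocks.append(f"Attributes: {format_attributes(target)}\nBiography:")
    return "\n\n".join(blocks)


examples = [
    ({"name": "Ada Venn", "occupation": "glassblower"},
     "Ada Venn is a glassblower known for her coloured lenses."),
]
prompt = build_seed_prompt(examples, {"name": "Tomas Oriel", "occupation": "cartographer"})
print(prompt)
```

The model's completion of this prompt would be shown to a rater as a draft to correct, rather than asking the rater to write a biography from scratch.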
In this paper, we formulate the problem of predicting a geolocation from free text as a sequence-to-sequence problem. Using this formulation, we obtain a geocoding model by training a T5 encoder-decoder Transformer with free text as input and geolocation as output. The geocoding model was trained on geotagged Wikipedia dump data, using adaptive cell partitioning to represent geolocations. All code, including the REST-based application, as well as the dataset and model checkpoints used in this work, is publicly available.
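Adaptive cell partitioning is not specified in detail above. A common way to do it, assumed here, is a quadtree over latitude and longitude: a cell is split into four children whenever it holds more than `max_points` training locations, so dense regions get small cells and sparse regions large ones. Each leaf's path of quadrant digits is a short string that a sequence-to-sequence model such as T5 could emit as its target text. All names below are hypothetical, and the original work may use a different cell scheme (for example S2 cells).

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]                  # (latitude, longitude)
Bounds = Tuple[float, float, float, float]   # (lat_min, lat_max, lon_min, lon_max)
WORLD: Bounds = (-90.0, 90.0, -180.0, 180.0)


def _quadrant(lat: float, lon: float, bounds: Bounds) -> int:
    # 0: south-west, 1: south-east, 2: north-west, 3: north-east
    lat0, lat1, lon0, lon1 = bounds
    north = lat >= (lat0 + lat1) / 2
    east = lon >= (lon0 + lon1) / 2
    return 2 * int(north) + int(east)


def _child_bounds(bounds: Bounds, q: int) -> Bounds:
    lat0, lat1, lon0, lon1 = bounds
    lat_mid, lon_mid = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    if q >= 2:
        lat0 = lat_mid  # northern half
    else:
        lat1 = lat_mid  # southern half
    if q % 2:
        lon0 = lon_mid  # eastern half
    else:
        lon1 = lon_mid  # western half
    return (lat0, lat1, lon0, lon1)


def build_cells(points: Iterable[Point], max_points: int, max_depth: int = 12) -> Set[str]:
    """Return the leaf cell ids of an adaptive quadtree over the training points."""
    leaves: Set[str] = set()

    def split(cell_id: str, bounds: Bounds, pts: List[Point]) -> None:
        if len(pts) <= max_points or len(cell_id) == max_depth:
            leaves.add(cell_id)
            return
        children: List[List[Point]] = [[], [], [], []]
        for lat, lon in pts:
            children[_quadrant(lat, lon, bounds)].append((lat, lon))
        for q in range(4):
            split(cell_id + str(q), _child_bounds(bounds, q), children[q])

    split("", WORLD, list(points))
    return leaves


def encode(lat: float, lon: float, leaves: Set[str]) -> str:
    """Descend the quadtree until reaching the leaf that contains (lat, lon)."""
    cell_id, bounds = "", WORLD
    while cell_id not in leaves:
        q = _quadrant(lat, lon, bounds)
        cell_id, bounds = cell_id + str(q), _child_bounds(bounds, q)
    return cell_id


def decode(cell_id: str) -> Point:
    """Map a cell id back to the centre of its cell."""
    bounds = WORLD
    for digit in cell_id:
        bounds = _child_bounds(bounds, int(digit))
    lat0, lat1, lon0, lon1 = bounds
    return ((lat0 + lat1) / 2, (lon0 + lon1) / 2)


paris, sydney = (48.85, 2.35), (-33.9, 151.2)
leaves = build_cells([paris] * 5 + [sydney], max_points=2)
print(encode(*sydney, leaves))  # "1": a sparse region stays one large cell
print(decode("1"))              # (-45.0, 90.0)
print(encode(*paris, leaves))   # a 12-digit id: the dense region is split to max_depth
```

Because every leaf id is a prefix code, the target string can be decoded back to a cell centre without a lookup table, which keeps the model output and the coordinate representation in one-to-one correspondence.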
This paper describes a prototype software and hardware platform that supports field operators during the inspection of surface defects in non-metallic pipes. Inspection is carried out by filming the inspected surface in real time with a "smart" helmet and other mobile devices. The work focuses on detecting and recognizing defects that appear as colored iridescence of reflected light, caused by diffraction arising from internal stresses in the inspected material. The platform allows preliminary analysis to be carried out directly on the device in offline mode; once a network connection is established, the collected data is transmitted to a server for post-processing, which extracts information about possible defects that were not detected at the previous stage. The paper describes the design stages, a formal description, and implementation details of the platform. It also describes the models used to recognize defects and gives examples of the results.
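The offline-then-sync workflow described above can be sketched as a small queue: every frame is analysed locally straight away, kept in a pending queue, and flushed to the server whenever a connection is available. The `detect` and `upload` callables stand in for the on-device model and the server call; they, and the class itself, are hypothetical placeholders rather than the platform's actual interfaces.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class LocalResult:
    frame_id: int
    defects: List[str]  # labels found by the on-device model


class OfflineFirstInspector:
    def __init__(self,
                 detect: Callable[[bytes], List[str]],
                 upload: Callable[[int, bytes], None]) -> None:
        self.detect = detect  # lightweight on-device recognition
        self.upload = upload  # sends a frame to the server for post-processing
        self.pending: Deque[Tuple[int, bytes]] = deque()

    def process(self, frame_id: int, frame: bytes, online: bool) -> LocalResult:
        result = LocalResult(frame_id, self.detect(frame))  # works without a network
        self.pending.append((frame_id, frame))              # keep for server-side analysis
        if online:
            self.flush()
        return result

    def flush(self) -> None:
        # Drop a frame from the queue only after its upload returned,
        # so an exception mid-flush leaves the remaining frames queued.
        while self.pending:
            frame_id, frame = self.pending[0]
            self.upload(frame_id, frame)
            self.pending.popleft()


uploaded: List[int] = []
inspector = OfflineFirstInspector(
    detect=lambda f: ["iridescence"] if b"rainbow" in f else [],
    upload=lambda fid, f: uploaded.append(fid),
)
inspector.process(1, b"rainbow-frame", online=False)  # analysed locally, queued
inspector.process(2, b"plain-frame", online=True)     # connection back: both frames sent
print(uploaded)  # [1, 2]
```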
Distance learning technologies have become very popular, and the recent pandemic has given their development a particularly strong push. Kazan Federal University has a distance learning system based on the Moodle LMS. This article describes the structure of the OntoMathEdu ecosystem, which aims to improve the teaching of school mathematics courses, and presents a method for improving the structure of the OntoMathEdu ontology by identifying new connections between contextually related concepts.
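One simple way to surface candidate connections between contextually related concepts, offered here only as an assumed illustration and not as the article's method, is to count how often two concepts appear in the same context (for example, the same lesson or textbook section) and propose frequently co-occurring pairs that the ontology does not yet link. The function name, the threshold, and the sample concepts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def suggest_links(contexts: Iterable[Iterable[str]],
                  existing: Iterable[Pair],
                  min_count: int = 2) -> List[Pair]:
    """Propose unlinked concept pairs that co-occur in at least `min_count` contexts."""
    counts: Counter = Counter()
    for concepts in contexts:
        # sorted(set(...)) makes each pair unordered and counted once per context
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    known = {tuple(sorted(p)) for p in existing}
    candidates = [p for p, n in counts.items() if n >= min_count and p not in known]
    return sorted(candidates, key=lambda p: (-counts[p], p))  # most frequent first


contexts = [
    ["triangle", "angle", "side"],  # e.g. concepts tagged in one lesson
    ["triangle", "angle"],
    ["circle", "radius"],
    ["angle", "side", "triangle"],
]
existing = [("triangle", "angle")]  # already related in the ontology
print(suggest_links(contexts, existing))  # [('angle', 'side'), ('side', 'triangle')]
```

In practice the proposed pairs would be reviewed by an ontology editor before being added as relations.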
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
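"Decoder-only" means that every layer uses causal self-attention: position `t` may attend only to positions up to `t`, which is what lets the model be trained on next-token prediction. A single-head NumPy sketch of that masking follows. The weight matrices are random stand-ins, and the sketch leaves out everything else in BLOOM's architecture (multiple heads, the positional scheme, layer norms, MLP blocks).

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention in which token t only sees tokens 0..t.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # key index > query index
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(np.round(weights, 2))  # upper triangle is all zeros
```

Because the first token can attend only to itself, its output equals its own value vector, and changing a later token never changes the outputs of earlier ones.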
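Aims. With the large amount of molecular emission data from (sub)millimeter observations and James Webb Space Telescope infrared spectroscopy, fast forward models that give access to the chemical composition of protoplanetary disks are essential. Methods. We used a thermochemical modeling code to generate a diverse set of protoplanetary disk models. We trained a k-nearest neighbors (KNN) regressor to instantly predict the chemistry of other disk models. Results. We show that, owing to correlations between local physical conditions in the adopted protoplanetary disk models, the chemistry can be accurately reproduced using only a small subset of the physical conditions. We discuss the uncertainties and limitations of this method. Conclusions. The proposed method can be used for Bayesian fitting of line emission data to retrieve disk properties from observations. We present a pipeline for reproducing the same approach on other sets of disk chemistry models.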
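The k-nearest-neighbours idea can be sketched in a few lines: standardise the local physical conditions, find the k closest entries in a precomputed grid of model cells, and average their abundances with inverse-distance weights. The feature choice (for example log density, log temperature, log UV field), the weighting, and the names are assumptions for illustration; scikit-learn's `KNeighborsRegressor(weights="distance")` offers a similar off-the-shelf estimator.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=5, eps=1e-12):
    """Inverse-distance-weighted k-nearest-neighbour regression.

    x_train: (n, n_features) physical conditions of precomputed model cells
    y_train: (n, n_species) abundances (e.g. log10) in those cells
    x_query: (m, n_features) conditions at which to predict
    """
    # Standardise with training statistics so no single feature dominates the distance.
    mean, std = x_train.mean(axis=0), x_train.std(axis=0) + eps
    a = (x_train - mean) / std
    b = (x_query - mean) / std
    dist = np.linalg.norm(b[:, None, :] - a[None, :, :], axis=-1)  # (m, n)
    idx = np.argsort(dist, axis=1)[:, :k]                          # k nearest per query
    w = 1.0 / (np.take_along_axis(dist, idx, axis=1) + eps)        # closer = heavier
    return (w[:, :, None] * y_train[idx]).sum(axis=1) / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
x_train = rng.uniform(size=(500, 3))  # hypothetical: log n, log T, log UV
y_train = np.stack([x_train @ [1.0, -2.0, 0.5],
                    np.sin(3 * x_train[:, 0])], axis=1)  # two toy "species"
pred = knn_predict(x_train, y_train, x_train[:3])
print(pred.shape)  # (3, 2)
```

Prediction costs only a distance computation against the stored grid, which is why a lookup of this kind can stand in for a full thermochemical run inside a Bayesian fitting loop.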
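The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalizes out of distribution. While recent years have seen a surge in methodological improvements in this area, these have mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with an identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding, and geometry. We leverage the CLRS benchmark to show empirically that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to learn algorithms effectively in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime, and processor architecture over CLRS, improving average single-task performance by more than 20%. We then conduct a thorough ablation of multi-task learners that leverage these improvements. Our results demonstrate a generalist learner that effectively incorporates the knowledge captured by specialist models.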
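To make "a single graph neural network processor" concrete, here is one generic message-passing step in which each node takes the element-wise max over the messages from its neighbours, with the same weights reused at every algorithm step. It is a NumPy sketch with random weights and hypothetical names; the max aggregation is an assumption, and this is not the paper's improved processor.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def mpnn_step(h, edges, w_msg, w_upd):
    """One message-passing step with max aggregation.

    h: (n_nodes, d) node states; edges: list of (sender, receiver) pairs;
    w_msg: (2 * d, d_msg); w_upd: (d + d_msg, d_out).
    """
    n_nodes = h.shape[0]
    agg = np.full((n_nodes, w_msg.shape[1]), -np.inf)
    for s, r in edges:
        msg = relu(np.concatenate([h[s], h[r]]) @ w_msg)  # message from s to r
        agg[r] = np.maximum(agg[r], msg)                  # element-wise max over senders
    agg[np.isneginf(agg)] = 0.0                           # nodes with no incoming edges
    return relu(np.concatenate([h, agg], axis=1) @ w_upd)


rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
edges = [(0, 1), (2, 1), (1, 2)]
w_msg, w_upd = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
h1 = mpnn_step(h, edges, w_msg, w_upd)

state = h
for _ in range(3):  # the same processor weights are reused at every step
    state = mpnn_step(state, edges, w_msg, w_upd)
print(state.shape)  # (3, 4)
```

Max aggregation makes the update independent of edge order, and it mirrors the "best so far" selections common in classical algorithms such as shortest-path relaxation.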
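This work presents the experimentation and development process of facial expression recognition and facial stress analysis algorithms for an immersive digital learning platform. The system retrieves frames from the user's webcam and evaluates them using artificial neural network (ANN) algorithms. The ANN output signals can be used to score and improve the learning process. Adapting an ANN to a new system may require substantial implementation effort or repeated ANN training. There are also limitations related to the minimum hardware required to run the ANN. To overcome these constraints, several possible implementations of facial expression recognition and facial stress analysis algorithms are proposed. Implementing the new solutions made it possible to improve the accuracy of facial expression recognition and to increase its response speed. Experimental results showed that the developed algorithms can detect heart rate faster than social devices.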
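The abstract does not say how heart rate is obtained from the webcam. A widely used approach, assumed here purely for illustration, is remote photoplethysmography: average the green channel over the face region in each frame, then take the strongest frequency in a plausible pulse band (about 0.7 to 4 Hz, i.e. 42 to 240 bpm). The sketch below covers only the spectral step; face detection and region selection are omitted, and the function name is hypothetical.

```python
import numpy as np


def estimate_heart_rate_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate pulse rate from a per-frame mean green intensity signal."""
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()                              # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)  # Hz for each rfft bin
    band = (freqs >= lo_hz) & (freqs <= hi_hz)    # plausible heart-rate range
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


fps = 30
t = np.arange(10 * fps) / fps                     # 10 s of video
signal = 120 + 0.4 * np.sin(2 * np.pi * 1.2 * t)  # synthetic 72 bpm pulse
signal += np.random.default_rng(0).normal(scale=0.05, size=t.size)
print(round(estimate_heart_rate_bpm(signal, fps)))  # 72
```

The frequency resolution is fps divided by the number of frames (0.1 Hz, or 6 bpm, for 10 s at 30 fps), so a longer window trades response speed for precision.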